Nonparametric Bayesian Double Articulation Analyzer for Direct Language Acquisition from Continuous Speech Signals
Human infants can discover words directly from unsegmented speech signals
without any explicitly labeled data. In this paper, we develop a novel machine
learning method called nonparametric Bayesian double articulation analyzer
(NPB-DAA) that can directly acquire language and acoustic models from observed
continuous speech signals. For this purpose, we propose an integrative
generative model that combines a language model and an acoustic model into a
single generative model called the "hierarchical Dirichlet process hidden
language model" (HDP-HLM). The HDP-HLM is obtained by extending the
hierarchical Dirichlet process hidden semi-Markov model (HDP-HSMM) proposed by
Johnson et al. An inference procedure for the HDP-HLM is derived using the
blocked Gibbs sampler originally proposed for the HDP-HSMM. This procedure
enables the simultaneous and direct inference of language and acoustic models
from continuous speech signals. Based on the HDP-HLM and its inference
procedure, we develop a novel unsupervised double articulation analyzer,
NPB-DAA. By assuming that the HDP-HLM generates the observed time series data
and by inferring the latent variables of the model, the method can analyze the
latent double articulation structure of the data, i.e., hierarchically
organized latent words and phonemes, in an unsupervised manner.
NPB-DAA can automatically estimate the double articulation structure embedded
in speech signals. We carried out two evaluation experiments, using synthetic
data and actual human continuous speech signals consisting of Japanese vowel
sequences. In word acquisition and phoneme categorization tasks, NPB-DAA
outperformed a conventional double articulation analyzer (DAA) and a baseline
automatic speech recognition system whose acoustic model was trained in a
supervised manner.
Comment: 15 pages, 7 figures. Draft submitted to IEEE Transactions on
Autonomous Mental Development (TAMD)
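The double articulation structure described above can be made concrete with a forward (generative) sketch. This is not the HDP-HLM: it is a hypothetical, finite toy model with a fixed vocabulary, a hand-set bigram language model, uniformly drawn phoneme durations, and 1-D Gaussian emissions, chosen only to show how words, phonemes, and durations stack up into an unsegmented frame sequence. All names (`PHONEME_EMISSIONS`, `WORDS`, `BIGRAM`, `generate`) are illustrative assumptions. NPB-DAA works in the opposite direction, inferring the latent words and phonemes from the frames, and its hierarchical Dirichlet process priors avoid fixing the inventories in advance as done here.

```python
import random

# Hypothetical phoneme inventory: each latent phoneme emits 1-D Gaussian
# acoustic frames with the given (mean, std).
PHONEME_EMISSIONS = {"a": (0.0, 0.3), "i": (2.0, 0.3), "u": (4.0, 0.3)}

# Hypothetical vocabulary: each latent word is a fixed phoneme sequence
# (words are built from phonemes, the lower articulation).
WORDS = {"w1": ["a", "i"], "w2": ["u", "a", "u"], "w3": ["i", "u"]}

# Hypothetical bigram language model over words (the upper articulation).
BIGRAM = {
    "<s>": {"w1": 0.5, "w2": 0.3, "w3": 0.2},
    "w1": {"w1": 0.1, "w2": 0.6, "w3": 0.3},
    "w2": {"w1": 0.4, "w2": 0.1, "w3": 0.5},
    "w3": {"w1": 0.3, "w2": 0.3, "w3": 0.4},
}


def sample_categorical(rng, probs):
    """Draw one key from a {key: probability} dict."""
    keys = list(probs)
    return rng.choices(keys, weights=[probs[k] for k in keys], k=1)[0]


def generate(num_words, rng, min_dur=2, max_dur=5):
    """Sample words -> phonemes -> frames; return frames and latent labels."""
    frames, word_seq, phoneme_seq = [], [], []
    prev = "<s>"
    for _ in range(num_words):
        word = sample_categorical(rng, BIGRAM[prev])
        word_seq.append(word)
        for ph in WORDS[word]:
            # Semi-Markov step: each phoneme segment has an explicit duration.
            duration = rng.randint(min_dur, max_dur)
            mean, std = PHONEME_EMISSIONS[ph]
            frames.extend(rng.gauss(mean, std) for _ in range(duration))
            phoneme_seq.append((ph, duration))
        prev = word
    return frames, word_seq, phoneme_seq


rng = random.Random(0)
frames, words, phonemes = generate(4, rng)
print(words)
print(phonemes)
print(len(frames))
```

Only `frames` would be observed; word boundaries, phoneme labels, and durations are the latent structure an analyzer such as NPB-DAA must recover.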
Unsupervised Phoneme and Word Discovery from Multiple Speakers using Double Articulation Analyzer and Neural Network with Parametric Bias
This paper describes a new unsupervised machine learning method for
simultaneous phoneme and word discovery from multiple speakers. Human infants
can acquire knowledge of phonemes and words through interactions with their
mothers as well as with other people around them. From a computational
perspective, phoneme and word discovery from multiple speakers is a more
challenging problem than that from one speaker because the speech signals from
different speakers exhibit different acoustic features. This paper proposes an
unsupervised phoneme and word discovery method that combines a nonparametric
Bayesian double articulation analyzer (NPB-DAA) with a deep sparse
autoencoder with parametric bias in hidden layer (DSAE-PBHL). We assume that an
infant can recognize and distinguish speakers based on other cues, e.g.,
visual face recognition. DSAE-PBHL is designed to subtract speaker-dependent
acoustic features and extract speaker-independent features. One experiment
demonstrated that DSAE-PBHL can subtract speaker-dependent components from
distributed representations of acoustic signals, so that the extracted
features reflect phoneme type rather than speaker. Another experiment
demonstrated that the combination of NPB-DAA and DSAE-PBHL outperformed
existing methods in phoneme and word discovery tasks involving speech signals
of Japanese vowel sequences from multiple speakers.
Comment: 21 pages. Submitted
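DSAE-PBHL removes speaker-dependent structure with a deep sparse autoencoder whose hidden layer carries a parametric bias, relying on speakers being distinguishable by other cues. A much simpler, swapped-in technique, per-speaker mean normalization (similar in spirit to cepstral mean normalization), illustrates the underlying idea under the same assumption that each frame's speaker is known: subtract each speaker's average feature vector so that what remains reflects the phoneme rather than the speaker. This is not DSAE-PBHL, which learns the speaker-dependent component instead of assuming it is a constant offset; the function name and toy data are hypothetical.

```python
import numpy as np


def subtract_speaker_means(features, speaker_ids):
    """Remove each speaker's average feature vector from that speaker's frames.

    features: (num_frames, dim) array of acoustic features.
    speaker_ids: (num_frames,) array of speaker labels, assumed known
    (e.g. from another modality such as face recognition).
    """
    features = np.asarray(features, dtype=float)
    speaker_ids = np.asarray(speaker_ids)
    normalized = np.empty_like(features)
    for spk in np.unique(speaker_ids):
        mask = speaker_ids == spk
        # The per-speaker mean stands in for the speaker-dependent component.
        normalized[mask] = features[mask] - features[mask].mean(axis=0)
    return normalized


# Hypothetical toy data: the same four-frame phoneme pattern from two speakers,
# each adding a constant speaker-dependent offset to every frame.
pattern = np.array([[0.0, 1.0], [2.0, -1.0], [0.0, 1.0], [2.0, -1.0]])
X = np.vstack([pattern + np.array([5.0, 0.0]), pattern + np.array([-3.0, 2.0])])
ids = np.array(["A"] * 4 + ["B"] * 4)
Z = subtract_speaker_means(X, ids)
print(np.allclose(Z[:4], Z[4:]))  # True: the speaker offsets are gone
```

A constant offset is the easiest case; speaker differences that change the shape of the feature distribution are exactly where a learned model such as DSAE-PBHL is needed.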
The effect of varying sound velocity on primordial curvature perturbations
We study the effects of a sudden change in the sound velocity on the
primordial curvature perturbation spectrum in inflationary cosmology, assuming
that the background evolution satisfies the slow-roll condition throughout. We
find that the power spectrum acquires oscillating features determined by the
ratio of the sound speeds before and after the transition and by the
wavenumber that crosses the sound horizon at the transition, and we give an
analytic expression for them. For some values of these parameters, the
oscillating primordial power spectrum can fit the observed Cosmic Microwave
Background temperature anisotropy power spectrum better than the simple
power-law spectrum, although introducing such a new degree of freedom is not
justified by Akaike's Information Criterion.
Comment: 12 pages, 3 figures; references added; appendix modified
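The model-selection argument can be made explicit. Akaike's Information Criterion penalizes each extra parameter, so a better fit alone does not justify adding the sound-speed transition. A short worked form, assuming a Gaussian likelihood so that $\chi^2 = -2\ln\hat{L}$ up to a constant, with "pl" the power-law spectrum and "osc" the oscillating one ($k$ is the number of free parameters, $\Delta k$ the number added):

```latex
\begin{aligned}
\mathrm{AIC} &= 2k - 2\ln\hat{L}, \\
\Delta\mathrm{AIC} &= \mathrm{AIC}_{\mathrm{osc}} - \mathrm{AIC}_{\mathrm{pl}}
  = 2\,\Delta k - 2\left(\ln\hat{L}_{\mathrm{osc}} - \ln\hat{L}_{\mathrm{pl}}\right)
  = 2\,\Delta k + \Delta\chi^2,
\qquad \Delta\chi^2 = \chi^2_{\mathrm{osc}} - \chi^2_{\mathrm{pl}}.
\end{aligned}
```

The oscillating model is preferred only if $\Delta\mathrm{AIC} < 0$, i.e. if it lowers $\chi^2$ by more than $2\,\Delta k$. For illustration only, if the transition adds $\Delta k = 2$ parameters (the sound-speed ratio and the wavenumber crossing the sound horizon at the transition), the fit would have to improve $\chi^2$ by more than 4; the paper's actual parameter count and $\chi^2$ values are not given here.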
LiDAR Data Synthesis with Denoising Diffusion Probabilistic Models
Generative modeling of 3D LiDAR data is an emerging task with promising
applications for autonomous mobile robots, such as scalable simulation, scene
manipulation, and sparse-to-dense completion of LiDAR point clouds. Existing
approaches have shown the feasibility of image-based LiDAR data generation
using deep generative models but still struggle with the fidelity of the
generated data and with training instability. In this work, we present R2DM, a
novel generative model for LiDAR data that can generate diverse, high-fidelity
3D scene point clouds based on an image representation of range and
reflectance intensity. Our method is based on denoising diffusion
probabilistic models (DDPMs), which have shown impressive results among
generative modeling frameworks and have advanced rapidly in recent years. To
train DDPMs effectively in the LiDAR domain, we first conduct an in-depth
analysis of data representation, training objective, and spatial inductive
bias. Building on R2DM, we also introduce a flexible LiDAR completion pipeline
that exploits the powerful properties of DDPMs. We demonstrate that our method
outperforms the baselines on the generation task on the KITTI-360 and
KITTI-Raw datasets and on the upsampling task on the KITTI-360 dataset.
Our code and pre-trained weights will be available at
https://github.com/kazuto1011/r2dm
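The abstract does not spell out R2DM's exact formulation, so the following shows only the standard DDPM building block it refers to: the closed-form forward (noising) process $q(x_t \mid x_0) = \mathcal{N}(\sqrt{\bar\alpha_t}\,x_0, (1-\bar\alpha_t) I)$ and the noise-prediction loss of Ho et al. (2020), applied to a hypothetical two-channel range/reflectance image. The linear $\beta$ schedule, the 64 x 1024 image size, and all function names are assumptions for illustration; since the authors analyzed the training objective, R2DM's choice may differ.

```python
import numpy as np


def linear_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products alpha_bar_t for a linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def q_sample(x0, t, alpha_bar, rng):
    """Draw x_t ~ q(x_t | x_0) = N(sqrt(alpha_bar_t) x_0, (1 - alpha_bar_t) I)."""
    noise = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise
    return xt, noise


def eps_prediction_loss(eps_model, x0, alpha_bar, rng):
    """DDPM 'simple' loss: MSE between true and predicted noise at a random step."""
    t = rng.integers(len(alpha_bar))
    xt, noise = q_sample(x0, t, alpha_bar, rng)
    return np.mean((eps_model(xt, t) - noise) ** 2)


# Hypothetical range/reflectance image (channels x height x width) in [-1, 1].
rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(2, 64, 1024))
alpha_bar = linear_alpha_bar()
xt, _ = q_sample(x0, 999, alpha_bar, rng)  # nearly pure noise at the last step
# Placeholder "model" that always predicts zero noise, just to exercise the loss.
loss = eps_prediction_loss(lambda x, t: np.zeros_like(x), x0, alpha_bar, rng)
print(xt.shape, float(loss))
```

A trained network would replace the zero-predicting placeholder; generation then runs the learned reverse process from pure noise back to a range/reflectance image, which is projected to a point cloud.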
Computing Runs on a Trie
A maximal repetition, or run, in a string is a maximal periodic substring whose smallest period is at most half the length of the substring. In this paper, we consider runs that correspond to a path on a trie, in other words, on a rooted edge-labeled tree, where one endpoint of the path must be an ancestor of the other. For a trie with n edges, we show that the number of runs is less than n. We also give an O(n √(log n) log log n)-time, O(n)-space algorithm for counting all runs and finding the shallower endpoint of each. We further give an O(n log n)-time, O(n)-space algorithm for finding both endpoints of all runs. We also discuss how to improve the running time further.
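The definition of a run can be checked on an ordinary string, which is the special case of a trie consisting of a single path. The brute-force sketch below, far slower than the algorithms in the abstract, scans each period p for maximal stretches where s[k] == s[k+p], keeps those of length at least 2p, and discards any whose smallest period is smaller than p (those are reported under their true period). The function names are hypothetical, and the sketch does not handle branching tries.

```python
def smallest_period(w):
    """Smallest period of w via the KMP failure function: len(w) - border(w)."""
    fail = [0] * len(w)
    k = 0
    for i in range(1, len(w)):
        while k > 0 and w[i] != w[k]:
            k = fail[k - 1]
        if w[i] == w[k]:
            k += 1
        fail[i] = k
    return len(w) - fail[-1]


def naive_runs(s):
    """All runs of s as (start, end_exclusive, period), by brute force over periods."""
    n = len(s)
    runs = set()
    for p in range(1, n // 2 + 1):
        k = 0
        while k < n - p:
            if s[k] != s[k + p]:
                k += 1
                continue
            start = k
            # Extend while the period-p condition s[k] == s[k + p] holds.
            while k < n - p and s[k] == s[k + p]:
                k += 1
            end = k + p  # s[start:end] has period p and cannot be extended
            if end - start >= 2 * p and smallest_period(s[start:end]) == p:
                runs.add((start, end, p))
    return sorted(runs)


print(naive_runs("aabaabaa"))  # [(0, 2, 1), (0, 8, 3), (3, 5, 1), (6, 8, 1)]
```

Here "aabaabaa" (length 8) has four runs: three "aa" blocks with period 1 and the whole string with period 3, consistent with the runs count staying below the string length.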
Research on Resolution Conversion Processing for Higher Image Quality and Enhanced Functionality of 3D Images
Degree type: Doctorate (course-based doctoral program), University of Tokyo (東京大学)
- …